20 research outputs found
A general approach to temporal reasoning about action and change
Reasoning about actions and change based on common-sense knowledge is one of the most important and difficult tasks in artificial intelligence research. A series of such tasks is identified, motivating the consideration and application of reasoning formalisms. There follows a discussion of the broad issues involved in modelling time and constructing a logical language. In general, worlds change over time. To model a dynamic world, an autonomous rational agent requires two basic abilities: to predict what the state of the world will be after the execution of a particular sequence of actions, each of which takes time, and to explain how a given state change came about, i.e. its causes.
The research work presented herein addresses some of the fundamental concepts and related issues in formal reasoning about actions and change. In this thesis, we employ a new time structure, which helps to deal with the so-called intermingling problem and the dividing instant problem. Also, the issue of how to treat the relationship between a time duration and its corresponding time entity is examined. In addition, some key terms for representing and reasoning about actions and change, such as states, situations, actions and events, are formulated. Furthermore, a new formalism for reasoning about change over time is presented. It allows more flexible temporal causal relationships than do other formalisms for reasoning about causal change, such as the situation calculus and the event calculus: it covers effects that start during, immediately after, or some time after their causes, and which end before, simultaneously with, or after their causes. The presented formalism allows the expression of common-sense causal laws at a high level, and it is shown how these laws can be used to deduce state change over time at a low level. Finally, we show that the approach provided here is expressive.
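The flexible cause-to-effect timing described above can be pictured with a small sketch. The Python fragment below is purely illustrative (the thesis's formalism is logical, not procedural, and all names here are hypothetical): a causal law is parameterised by a delay and a duration, so an effect may start during, immediately after, or some time after its cause.

```python
# Illustrative sketch (names hypothetical): a causal law whose effect may
# start during, immediately after, or some time after its triggering event.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: int          # discrete time points, as in the thesis's time structure
    end: int

@dataclass(frozen=True)
class Effect:
    fluent: str
    holds_over: Interval

def apply_causal_law(event_time: int, fluent: str,
                     delay: int, duration: int) -> Effect:
    """Derive the interval over which an effect holds.

    delay = 0  -> effect starts immediately after the cause;
    delay < 0  -> effect starts during the cause;
    delay > 0  -> effect starts some time after the cause.
    """
    start = event_time + delay
    return Effect(fluent, Interval(start, start + duration))

# Example: an event at time 5 causes 'warm' 3 ticks later, holding for 10 ticks.
print(apply_causal_law(5, "warm", delay=3, duration=10))
```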
A framework for data cleaning in data warehouses
It is a persistent challenge to achieve high quality of data in data warehouses, and data cleaning is a crucial task in meeting that challenge. A set of methods and tools has been developed to deal with it; however, at least two questions still need to be answered: how to improve the efficiency of data cleaning, and how to improve its degree of automation. This paper addresses these two questions by presenting a novel framework, which provides an approach to managing data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. Initial test runs of the processes in the framework demonstrate that the presented approach is efficient and scalable for data cleaning in data warehouses.
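A minimal sketch of the decoupling idea, under assumptions (the sub-process names and the customer_id field below are hypothetical; the abstract describes the framework only at the architectural level): each data quality dimension gets its own cleaning sub-process, and the overall cleaning process is their composition.

```python
# Hypothetical sketch: one cleaning sub-process per data quality dimension,
# composed into a single pipeline.
from typing import Callable, Iterable

Record = dict
SubProcess = Callable[[Iterable[Record]], list[Record]]

def fix_completeness(records):
    # Completeness dimension: drop records missing a mandatory field.
    return [r for r in records if r.get("customer_id") is not None]

def fix_uniqueness(records):
    # Uniqueness dimension: keep only the first record per key.
    seen, out = set(), []
    for r in records:
        if r["customer_id"] not in seen:
            seen.add(r["customer_id"])
            out.append(r)
    return out

PIPELINE: list[SubProcess] = [fix_completeness, fix_uniqueness]

def clean(records):
    for step in PIPELINE:
        records = step(records)
    return records

rows = [{"customer_id": 1}, {"customer_id": None}, {"customer_id": 1}]
print(clean(rows))  # -> [{'customer_id': 1}]
```

Because each sub-process is independent, a dimension can be tuned, replaced, or parallelised without touching the rest of the pipeline, which is the efficiency and automation benefit the framework targets.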
A Rule Based Taxonomy of Dirty Data
There is a growing awareness that high quality of data is a key to today's business success and that dirty data existing within data sources is one of the causes of poor data quality. To ensure high quality data, enterprises need to have a process, methodologies and resources to monitor and analyze the quality of data, and methodologies for preventing and/or detecting and repairing dirty data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high quality data for their applications. One of the reasons is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is quite expensive and unrealistic, so the cost of cleaning dirty data needs to be considered by most enterprises. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any existing taxonomy of its kind.
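Read operationally, each entry in such a rule-based taxonomy pairs a dirty-data type with a rule that detects it. The sketch below is illustrative only; the rule names and types are hypothetical examples, not the paper's actual list.

```python
# Hypothetical rules: each dirty-data type is a named predicate over a value.
import re

RULES = {
    "missing value":        lambda v: v is None or v == "",
    "out-of-range age":     lambda v: isinstance(v, (int, float)) and not 0 <= v <= 150,
    "wrong format (email)": lambda v: isinstance(v, str) and "@" in v
                            and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v),
}

def classify(value):
    """Return the dirty-data types a value exhibits (empty list = clean)."""
    return [name for name, rule in RULES.items() if rule(value)]

print(classify(""))             # ['missing value']
print(classify(200))            # ['out-of-range age']
print(classify("bob@invalid"))  # ['wrong format (email)']
```

A rule-based formulation like this also makes the cost trade-off explicit: an enterprise can enable only the rules whose dirty-data types are worth the expense of cleaning.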
A Comparison of Techniques for Name Matching
Information explosion is a problem for everyone nowadays. It is a great challenge for all kinds of businesses to maintain high quality of data in their information applications, such as data integration, text and web mining, information retrieval and search engines. In such applications, matching names is a common task, and a number of name matching techniques are available. Unfortunately, no existing name matching technique performs best in all situations, so a problem that every researcher or practitioner has to face is how to select an appropriate technique for a given dataset. This paper analyses and evaluates a set of popular name matching techniques on several carefully designed datasets. The experimental comparison confirms the statement that there is no clear best technique. Some suggestions are presented, which can serve as guidance for researchers and practitioners selecting an appropriate name matching technique for a given dataset.
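As a flavour of such a comparison (the techniques and datasets actually evaluated are detailed in the paper; the name pair below is a made-up example), here is one pair scored by three widely used techniques: edit distance, a sequence-similarity ratio, and the Soundex phonetic code.

```python
# Illustrative comparison of three common name matching techniques.
from difflib import SequenceMatcher

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def soundex(name: str) -> str:
    """Standard Soundex phonetic code (letter + three digits)."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    out, last = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            out += code
        if ch not in "HW":      # H and W do not reset the previous code
            last = code
    return (out + "000")[:4]

a, b = "Catherine", "Katherine"
print("Levenshtein distance:", levenshtein(a, b))                  # 1
print("SequenceMatcher ratio:", round(SequenceMatcher(None, a, b).ratio(), 2))
print("Soundex match:", soundex(a) == soundex(b))                  # False (C vs K)
```

The example already hints at why no single technique wins everywhere: the pair is nearly identical by edit distance yet fails a phonetic match because the codes keep the differing first letter.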
A discrete formalism for reasoning about action and change
This paper presents a discrete formalism for temporal reasoning about actions and change, which enjoys an explicit representation of time and action/event occurrences. The formalism allows the expression of truth values for given fluents over various times, including non-decomposable points/moments and decomposable intervals. Two major problems which beset most existing interval-based theories of action and change, i.e., the so-called dividing instant problem and the intermingling problem, are absent from this new formalism. The dividing instant problem is overcome by excluding the concept of ending points of intervals, and the intermingling problem is bypassed by characterising the fundamental time structure as a well-ordered discrete set of non-decomposable times (points and moments), from which decomposable intervals are constructed. A comprehensive characterisation of the relationship between the negation of fluents and the negation of the sentences involved is formally provided. The formalism supports a flexible expression of temporal relationships between effects and their causal events, including delayed effects of events, which remain a problematic question in most existing theories of action and change.
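The time structure can be pictured with a small sketch (illustrative only; the paper's formalism is axiomatic, and the representation below is an assumption): intervals are runs of consecutive non-decomposable times, so an interval is identified by the times it contains rather than by boundary points, and two intervals can meet with neither a shared nor a missing dividing instant.

```python
# Hypothetical rendering of the discrete time structure.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A decomposable interval: a run of consecutive non-decomposable times."""
    times: tuple[int, ...]     # discrete, well-ordered, non-empty

    def meets(self, other: "Interval") -> bool:
        # Two intervals meet when one ends exactly where the next begins,
        # with no shared time and no gap, so no 'dividing instant' is needed.
        return self.times[-1] + 1 == other.times[0]

def interval(start: int, length: int) -> Interval:
    return Interval(tuple(range(start, start + length)))

on  = interval(0, 5)   # e.g. fluent 'light_on' holds over times 0..4
off = interval(5, 5)   # 'light_off' holds over times 5..9
print(on.meets(off))   # True: neither interval owns a shared boundary point
```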
Visualization of Online Datasets
As computing technology advances and computers are used to orchestrate and advance wide spectrums of commercial and personal life, information visualization becomes even more significant: we are immersed in the era of big data, with an economy heavily reliant on data mining and on precise, meaningful visualizations. However, the accuracy of information visualization techniques depends heavily on the knowledge and capabilities of users, leaving novices in many fields at a disadvantage. This challenging problem has been inadequately addressed despite the influx of visualization tools. Therefore, this paper proposes a novel approach, with a focus on online datasets, that allows users to automatically and accurately visualize datasets. Experimental results show that, using a browser extension and specially created HTML tables containing custom attributes stating the data attribute type, the approach is able to detect and present the most suitable visualizations at the click of a mouse. The proposed approach provides a means for novices to quickly and accurately visualize online datasets.
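At its core, the approach maps declared column types to chart kinds. The sketch below shows that selection logic in Python for readability; the paper's implementation is a browser extension, and the attribute name data-attr-type and the mapping rules shown here are hypothetical.

```python
# Hypothetical type-to-chart selection logic.
def suggest_chart(column_types: list[str]) -> str:
    """Map the declared types of a two-column table to a chart kind."""
    if column_types == ["categorical", "numeric"]:
        return "bar chart"
    if column_types == ["temporal", "numeric"]:
        return "line chart"
    if column_types == ["numeric", "numeric"]:
        return "scatter plot"
    return "table"  # fall back when no rule applies

# e.g. an HTML table whose header cells carry data-attr-type="temporal"
# and data-attr-type="numeric" would be rendered as a line chart:
print(suggest_chart(["temporal", "numeric"]))  # line chart
```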